Recommending Recipients in the Enron Email Corpus

Authors

  • Vitor R. Carvalho
  • William W. Cohen
Abstract

Email is the most popular communication tool on the internet. In this paper we investigate how email systems can be enhanced to work as recipient recommendation systems, i.e., suggesting who the recipients of a message might be while the message is being composed, given its current contents and its previously specified recipients. This can be a valuable addition to email clients, particularly in large corporations. It can be used to identify people in an organization who are working on a similar topic or project, or to find people with appropriate expertise or skills. Recipient recommendation can also prevent a user from forgetting to add an important collaborator or manager as a recipient, avoiding costly misunderstandings and communication delays. In this paper we present the first study of recipient recommendation in a real large-scale corporate email collection, the Enron Email corpus. We begin by defining the problem as a large multi-class multi-label classification task, where each email can be addressed to multiple recipients in the user’s address book (i.e., each class is equivalent to an email address in the address book). We propose various baselines for the problem, along with a classification-based reranking scheme to combine two types of features: textual contents and network information from the email headers. Experiments indicate that the reranking scheme significantly outperforms the baselines, and that the best scheme is accurate enough to be useful in email clients. The results are also encouraging because the proposed solution can be easily implemented in any email client, with no changes on the email server side.
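
The task formulation in the abstract (each address-book entry is a class, and candidates are ranked given the draft text and the recipients already entered) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the baselines, so the term-count cosine scoring below is an assumption chosen for illustration, and the function names (`build_profiles`, `recommend`) and example addresses are hypothetical. It uses textual content only and leaves out the header-based network features and the reranking step.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real system would do more).
    return text.lower().split()


def build_profiles(sent_messages):
    """Aggregate term counts per recipient from (text, recipients) pairs."""
    profiles = defaultdict(Counter)
    for text, recipients in sent_messages:
        counts = Counter(tokenize(text))
        for recipient in recipients:
            profiles[recipient].update(counts)
    return profiles


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(draft_text, already_specified, profiles, k=3):
    """Rank address-book entries not yet on the message by text similarity."""
    draft = Counter(tokenize(draft_text))
    scored = [
        (cosine(draft, profile), recipient)
        for recipient, profile in profiles.items()
        if recipient not in already_specified
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [recipient for _, recipient in scored[:k]]


# Hypothetical sent-mail history for one user.
sent = [
    ("gas pipeline contract review", ["alice@example.com", "bob@example.com"]),
    ("pipeline capacity forecast", ["alice@example.com"]),
    ("lunch on friday", ["carol@example.com"]),
]
profiles = build_profiles(sent)
suggestions = recommend(
    "draft of the pipeline contract", {"bob@example.com"}, profiles, k=2
)
print(suggestions)
```

Because each message can carry several recipients, one message contributes to several profiles, which is what makes the task multi-label. Addresses already on the draft are excluded from the candidates rather than used as evidence; using them as evidence is the kind of network information the abstract assigns to the reranking scheme.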

Similar resources

CC Prediction in the Enron Corpus

Email is the most popular communication tool on the web. To improve the way we handle email messages, machine learning techniques have been proposed in different areas, from adaptive spam filtering to automated message foldering (i.e., predicting the correct folder to store a message). One of the ideas recently proposed is to automatically predict the recipients of an already composed message; ...

Preventing Information Leaks in Email

The widespread use of email has raised serious privacy concerns. A critical issue is how to prevent email information leaks, i.e., when a message is accidentally addressed to non-desired recipients. This is an increasingly common problem that can severely harm individuals and corporations; for instance, a single email leak can potentially cause expensive lawsuits, brand reputation damage, neg...

Network Analysis with the Enron Email Corpus

We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article’s focus on centrality, students can explore the dependence of statistical models on initial...

Email Formality in the Workplace: A Case Study on the Enron Corpus

Email is an important means of communication in our daily life and it has become the subject of various NLP and social studies. In this paper, we focus on email formality and explore the factors that could affect the sender’s choice of formality. As a case study, we use the Enron email corpus to test how formality is affected by social distance, relative power, and the weight of imposition, as de...

Work Hard, Play Hard: Email Classification on the Avocado and Enron Corpora

In this paper, we present an empirical study of email classification into two main categories, “Business” and “Personal”. We train on the Enron email corpus, and test on the Enron and Avocado email corpora. We show that information from the email exchange networks improves the performance of classification. We represent the email exchange networks as social networks with graph structures. For th...

Publication date: 1972